Measuring Global Similarity Between Texts
نویسندگان
چکیده
We propose a new similarity measure between texts which, contrary to the current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted experiments on several corpuses of texts. The experiments show that our methods can reliably identify different global types of texts.
منابع مشابه
Unsupervised Sparse Vector Densification for Short Text Similarity
Sparse representations of text such as bag-ofwords models or extended explicit semantic analysis (ESA) representations are commonly used in many NLP applications. However, for short texts, the similarity between two such sparse vectors is not accurate due to the small term overlap. While there have been multiple proposals for dense representations of words, measuring similarity between short te...
متن کاملFBK-TR: Applying SVM with Multiple Linguistic Features for Cross-Level Semantic Similarity
Recently, the task of measuring semantic similarity between given texts has drawn much attention from the Natural Language Processing community. Especially, the task becomes more interesting when it comes to measuring the semantic similarity between different-sized texts, e.g paragraph-sentence, sentence-phrase, phrase-word, etc. In this paper, we, the FBK-TR team, describe our system participa...
متن کاملMethods for measuring semantic similarity of texts
Measuring semantic similarity is a task needed in many Natural Language Processing (NLP) applications. For example, in Machine Translation evaluation, semantic similarity is used to assess the quality of the machine translation output by measuring the degree of equivalence between a reference translation and the machine translation output. The problem of semantic similarity (Corley and Mihalcea...
متن کاملMeasuring The Semantic Similarity Of Texts
This paper presents a knowledge-based method for measuring the semanticsimilarity of texts. While there is a large body of previous work focused on finding the semantic similarity of concepts and words, the application of these wordoriented methods to text similarity has not been yet explored. In this paper, we introduce a method that combines wordto-word similarity metrics into a text-totext m...
متن کاملThe Role of Local and Global Weighting in Assessing the Semantic Similarity of Texts Using Latent Semantic Analysis
In this paper, we investigate the impact of several local and global weighting schemes on Latent Semantic Analysis’ (LSA) ability to capture semantic similarity between two texts. We worked with texts varying in size from sentences to paragraphs. We present a comparison of 3 local and 3 global weighting schemes across 3 different standardized data sets related to semantic similarity tasks. For ...
متن کامل